 challenge 2020


ALO-VC: Any-to-any Low-latency One-shot Voice Conversion

arXiv.org Artificial Intelligence

This paper presents ALO-VC, a non-parallel, low-latency, one-shot voice conversion method based on phonetic posteriorgrams (PPGs). ALO-VC enables any-to-any voice conversion using only one utterance from the target speaker, with only 47.5 ms of future look-ahead. The proposed hybrid signal processing and machine learning pipeline combines a pre-trained speaker encoder, a pitch predictor to predict the converted speech's prosody, and positional encoding to convey the phoneme's location information. We introduce two system versions: ALO-VC-R, which uses a pre-trained d-vector speaker encoder, and ALO-VC-E, which improves performance using the ECAPA-TDNN speaker encoder. The experimental results demonstrate that both ALO-VC-R and ALO-VC-E can achieve performance comparable to non-causal baseline systems on the VCTK dataset and two out-of-domain datasets. Furthermore, both proposed systems can be deployed on a single CPU core with 55 ms latency and a real-time factor of 0.78. Our demo is available online.
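
As a reading aid for the reported real-time factor, the following minimal sketch assumes the common definition of real-time factor as processing time divided by audio duration; the abstract does not spell out the definition it uses. The function names and the 10-second example are hypothetical and are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_real_time(rtf: float) -> bool:
    """An RTF below 1.0 means audio is processed faster than it is produced."""
    return rtf < 1.0


# Hypothetical example: 10 s of audio processed in 7.8 s gives an RTF near 0.78.
example_rtf = real_time_factor(7.8, 10.0)
print(round(example_rtf, 2), keeps_up_with_real_time(example_rtf))
```

Under this definition, a real-time factor below 1 indicates that the system can process a continuous audio stream without falling behind.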


Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy

arXiv.org Artificial Intelligence

To protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings. This typically comes with a privacy-utility trade-off between protection of individuals and usability of the data for downstream applications. One of the challenges in this context is to create non-existent voices that sound as natural as possible. In this work, we propose to tackle this issue by generating speaker embeddings using a generative adversarial network with Wasserstein distance as the cost function. By incorporating these artificial embeddings into a speech-to-text-to-speech pipeline, we outperform previous approaches in terms of privacy and utility. According to standard objective metrics and human evaluation, our approach generates intelligible and content-preserving yet privacy-protecting versions of the original recordings.
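
For readers unfamiliar with Wasserstein-based GAN training, the sketch below illustrates the standard Wasserstein GAN objectives computed from critic scores. It is a generic illustration, not the authors' implementation: the abstract does not say how the critic is constrained (for example, weight clipping or a gradient penalty), so no such term is included, and the function names and score values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Standard WGAN critic loss: mean score on generated samples minus mean score on real ones.

    Minimizing this pushes the critic to score real embeddings higher than generated ones.
    """
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores: Sequence[float]) -> float:
    """Standard WGAN generator loss: negative mean critic score on generated samples."""
    return -mean(fake_scores)


# Hypothetical critic outputs for two real and two generated speaker embeddings.
real = [1.0, 3.0]
fake = [0.0, 1.0]
print(critic_loss(real, fake), generator_loss(fake))
```

In a full training loop these scores would come from a critic network applied to real and generated speaker embeddings, with the critic and generator updated in alternation.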


COVID-19: Call for Code Global Challenge 2020 Techiewave

#artificialintelligence

The 2020 Call for Code Global Challenge has expanded its focus to tackle the effects of COVID-19. Technology solutions can help reduce the impact this pandemic has on our daily lives and the world. COVID-19, which is caused by the novel coronavirus, has revealed, in a very short period of time, the limits of the systems we take for granted. Whether it's meeting the massive increase in demand for information during a time of crisis, educating children when schools are closed, or helping communities best distribute limited resources, technology has a pivotal role to play. Through Call for Code, you can see your idea deployed by a global partner ecosystem.